DIFFER: Decomposing Individual Reward for Fair Experience Replay in Multi-Agent Reinforcement Learning

Neural Information Processing Systems

Cooperative multi-agent reinforcement learning (MARL) is a challenging task, as agents must learn complex and diverse individual strategies from a shared team reward. However, existing methods struggle to distinguish and exploit important individual experiences, as they lack an effective way to decompose the team reward into individual rewards. To address this challenge, we propose DIFFER, a powerful theoretical framework for decomposing individual rewards to enable fair experience replay in MARL. By enforcing the invariance of network gradients, we establish a partial differential equation whose solution yields the underlying individual reward function. The individual TD-error can then be computed from the solved closed-form individual rewards, indicating the importance of each piece of experience for the learning task and guiding the training process. Our method elegantly reduces to the original learning framework when individual experiences are homogeneous, while adapting to achieve greater efficiency and fairness when diversity is observed. Our extensive experiments on popular benchmarks validate the effectiveness of our theory and method, demonstrating significant improvements in learning efficiency and fairness. Code is available in the supplementary material.
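
As a rough illustration of the fair-replay idea described above (not the paper's implementation: the closed-form reward decomposition is taken as given, and all function and variable names are assumptions), per-agent TD-errors can drive prioritized sampling roughly as follows:

    import numpy as np

    def individual_td_errors(r_ind, q, q_next, gamma=0.99):
        # Per-agent TD-error: delta_i = r_i + gamma * max_a' Q_i(s', a') - Q_i(s, a_i).
        # r_ind:  (n_agents,) individual rewards, assumed already decomposed
        # q:      (n_agents,) Q_i(s, a_i) for the actions actually taken
        # q_next: (n_agents, n_actions) Q_i(s', .) at the next state
        return r_ind + gamma * q_next.max(axis=1) - q

    def sample_indices(td_errors, batch_size, alpha=0.6, eps=1e-6):
        # Sample agent-wise transitions with probability proportional to |delta_i|^alpha.
        p = (np.abs(td_errors) + eps) ** alpha
        p = p / p.sum()
        return np.random.choice(len(td_errors), size=batch_size, p=p)

    # Toy usage: four agent-wise transitions competing for replay.
    delta = individual_td_errors(
        r_ind=np.array([0.1, 0.0, 0.5, 0.0]),
        q=np.array([0.2, 0.3, 0.1, 0.4]),
        q_next=np.random.rand(4, 5),
    )
    print(sample_indices(delta, batch_size=2))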


LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning

Neural Information Processing Systems

A great challenge in cooperative decentralized multi-agent reinforcement learning (MARL) is generating diversified behaviors for each individual agent when receiving only a team reward. Prior studies have devoted much effort to reward shaping or to designing a centralized critic that can discriminatively credit the agents. In this paper, we propose to merge the two directions and learn, for each agent, an intrinsic reward function that diversely stimulates the agents at each time step. Specifically, the intrinsic reward for a specific agent is used to compute a distinct proxy critic for that agent, which directs the update of its individual policy. Meanwhile, the parameterized intrinsic reward function is updated towards maximizing the expected accumulated team reward from the environment, so that the objective remains consistent with the original MARL problem. The proposed method is referred to as learning individual intrinsic reward (LIIR) in MARL. We compare LIIR with a number of state-of-the-art MARL methods on battle games in StarCraft II. The results demonstrate the effectiveness of LIIR, and we show that LIIR can assign each individual agent an insightful intrinsic reward per time step.
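
A minimal sketch of the intrinsic-reward idea, assuming a simple observation-action reward network and a fixed mixing weight; the meta-gradient update of the intrinsic-reward parameters, which is the core of LIIR, is only noted in a comment rather than implemented:

    import torch
    import torch.nn as nn

    class IntrinsicReward(nn.Module):
        # Maps one agent's observation-action pair to a scalar intrinsic reward.
        def __init__(self, obs_dim, act_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, obs, act_onehot):
            return self.net(torch.cat([obs, act_onehot], dim=-1)).squeeze(-1)

    def proxy_returns(team_r, intrinsic_r, lam=0.01, gamma=0.99):
        # Discounted return of the proxy reward r_team + lam * r_intrinsic, shape (T,).
        mixed = team_r + lam * intrinsic_r
        ret, out = 0.0, []
        for r in reversed(mixed):
            ret = r + gamma * ret
            out.append(ret)
        return torch.stack(out[::-1])

    # In LIIR the intrinsic-reward parameters are themselves updated with a
    # meta-gradient through the policy update so that the expected *team* return
    # improves; that outer loop is omitted from this sketch.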


Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards

Nigam, Rupal, Parikh, Niket, Osooli, Hamid, Yuasa, Mikihisa, Heglund, Jacob, Tran, Huy T.

arXiv.org Artificial Intelligence

Real-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with previously unseen teammates to solve a task in a zero-shot manner. Prior work often either selects a pretrained policy based on an inferred model of the new teammates or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards, for efficient and effective knowledge transfer between different teams. We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and Overcooked. We also demonstrate our algorithm in a real-world multi-robot setting. Ad hoc teaming (AHT) is an open challenge for multi-agent systems, in which an autonomous agent must successfully coordinate with other unknown agents [1]. Consider a search-and-rescue mission where robots are deployed by different organizations and expected to cooperate with each other on the fly: these robots may have different biases in how they achieve a given objective (e.g., risky vs. risk-averse search) or different capabilities (e.g., sensing vs. manipulation). Adapting to such differences would enable agents to effectively and autonomously complete tasks where the team is unknown prior to deployment.
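
The two ingredients named above can be sketched compactly; the sketch below is illustrative only, with assumed tabular Q-functions and a user-supplied global reward function, and is not taken from the GPAT codebase:

    import numpy as np

    def gpi_action(q_library, state_idx):
        # q_library: list of tabular Q-functions, each shaped (n_states, n_actions),
        # one per pretrained source team. GPI acts greedily w.r.t. their maximum:
        # a* = argmax_a max_k Q_k(s, a).
        q_stack = np.stack([q[state_idx] for q in q_library])  # (n_sources, n_actions)
        return int(q_stack.max(axis=0).argmax())

    def difference_reward(global_reward_fn, joint_action, agent_idx, default_action):
        # D_i = G(a) - G(a with agent i's action replaced by a default action),
        # i.e. agent i's marginal contribution to the shared team reward.
        counterfactual = list(joint_action)
        counterfactual[agent_idx] = default_action
        return global_reward_fn(tuple(joint_action)) - global_reward_fn(tuple(counterfactual))

Acting greedily with respect to the maximum over source value functions is guaranteed to do no worse than any single source policy, while the difference reward isolates the focal agent's contribution when only a team reward is available.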


SAJA: A State-Action Joint Attack Framework on Multi-Agent Deep Reinforcement Learning

Guo, Weiqi, Liu, Guanjun, Zhou, Ziyuan

arXiv.org Artificial Intelligence

Multi-Agent Deep Reinforcement Learning (MADRL) has shown potential for cooperative and competitive tasks such as autonomous driving and strategic gaming. However, models trained by MADRL are vulnerable to adversarial perturbations on states and actions. Therefore, it is essential to investigate the robustness of MADRL models from an attack perspective. Existing studies focus on either state-only attacks or action-only attacks, but do not consider how to combine them effectively. Naively combining state and action perturbations, for example by randomly perturbing both, does not exploit their potential synergistic effects. In this paper, we propose the State-Action Joint Attack (SAJA) framework, which achieves a strong synergistic effect. SAJA consists of two important phases: (1) in the state attack phase, a multi-step gradient ascent method utilizes both the actor network and the critic network to compute an adversarial state, and (2) in the action attack phase, based on the perturbed state, a second gradient ascent uses the critic network to craft the final adversarial action. Additionally, a heuristic regularizer that measures the distance between the perturbed actions and the original clean ones is added to the loss function to enhance the effectiveness of the critic's guidance. We evaluate SAJA in the Multi-Agent Particle Environment (MPE), demonstrating that (1) it outperforms and is stealthier than state-only or action-only attacks, and (2) existing state or action defense methods cannot defend against its attacks.
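
A hedged sketch of the two-phase attack described above, written for a single agent with a deterministic actor and a centralized critic; the interfaces, step sizes, and the exact form of the distance regularizer are assumptions rather than the authors' implementation:

    import torch

    def joint_attack(actor, critic, state, eps_s=0.05, eps_a=0.05,
                     steps=5, lr=0.01, beta=1.0):
        # Returns (perturbed_state, adversarial_action) for a single agent with a
        # deterministic actor and a critic Q(s, a); both are assumed nn.Modules.
        a_clean = actor(state).detach()

        # Phase 1 (state attack): multi-step signed-gradient updates on the
        # observation, pushing down the critic's value of the induced action.
        delta_s = torch.zeros_like(state, requires_grad=True)
        for _ in range(steps):
            q = critic(state, actor(state + delta_s)).sum()
            grad, = torch.autograd.grad(q, delta_s)
            with torch.no_grad():
                delta_s -= lr * grad.sign()         # descend on Q
                delta_s.clamp_(-eps_s, eps_s)       # keep perturbation bounded
        s_adv = (state + delta_s).detach()

        # Phase 2 (action attack): start from the action on the perturbed state and
        # descend on Q again; the distance term keeps the attacked action close to
        # the clean one (our reading of the paper's stealthiness regularizer).
        a_adv = actor(s_adv).detach().clone().requires_grad_(True)
        for _ in range(steps):
            loss = critic(s_adv, a_adv).sum() + beta * (a_adv - a_clean).pow(2).sum()
            grad, = torch.autograd.grad(loss, a_adv)
            with torch.no_grad():
                a_adv -= lr * grad.sign()
                a_adv.copy_(torch.min(torch.max(a_adv, a_clean - eps_a), a_clean + eps_a))
        return s_adv, a_adv.detach()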